4  Multiple Linear Regression (MLR)

4.1 Introduction

Multiple linear regression (MLR) extends simple linear regression by allowing \(Y\) to depend on more than one predictor variable. This enables richer models and allows us to estimate the partial contribution of each predictor while accounting for others.

When to Extend SLR

SLR is limited to one predictor. MLR becomes appropriate when:

  • Multiple factors plausibly influence the response.
  • Excluding predictors may bias results.
  • You wish to measure the unique contribution of each predictor while controlling for others.

A graphical motivation

Let's consider a simple example:

We might look at each bivariate (i.e. two-variable) relationship separately:

However, we might also look at the multivariate relationship:

In the case of three variables, we can also extend our visualisation to three dimensions:

4.2 The MLR Model

The multiple linear regression model extends the simple linear model straightforwardly: we add the extra variables to the deterministic linear predictor, each with its own parameter.

\[ Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon, \quad \varepsilon \sim N(0,\sigma^2) \]

Now, instead of a single predictor \(X\), we may have several (\(k\)) predictors \(x_1, x_2, \ldots, x_k\), each corresponding to a column in our dataset. We use an index to distinguish these predictors, and each predictor \(x_i\) has a corresponding parameter \(\beta_i\) governing its effect on the response. Note that we have updated the intercept parameter \(\alpha\) from the simple linear regression section to \(\beta_0\), because it is simply another parameter in the model!
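To see the model in action, we can simulate data from it with known parameter values and check that lm() recovers them. This is a minimal sketch; the variable names and coefficient values are illustrative, not taken from any real dataset:

```r
# Simulate from Y = beta0 + beta1*x1 + beta2*x2 + eps, eps ~ N(0, 1)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n, sd = 1)  # beta0 = 2, beta1 = 0.5, beta2 = -1.5

fit <- lm(y ~ x1 + x2)
coef(fit)  # estimates should be close to the true values 2, 0.5, -1.5
```

With a moderate sample size the estimates will be close to, but not exactly equal to, the true parameters, because of the random error term.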

The random error term, \(\varepsilon\), stays the same, so the assumptions of MLR are very similar to those of SLR:

Assumption: Assumptions of the MLR model
  • A linear predictor: \(E[Y]=\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k\).
  • Independent and identically distributed random error terms that
    • Have a mean \(\mu=0\), and a constant variance \(Var(\varepsilon)=\sigma^2\)
    • Follow a normal distribution: \(N(0,\sigma^2)\)
Note: Multivariate Linearity

Although this is still a linear model, the term “linear” no longer refers to a literal straight line in two dimensions as it did in simple linear regression. Instead, “linear” means that the model is linear in its parameters: each \(\beta_i\) multiplies a predictor and enters additively. This linear model may live in a high-dimensional predictor space and cannot be visualised as a single line. For example, for the three-dimensional data presented above (one outcome, \(Y\), plus two predictors, \(X_1\) and \(X_2\)), a linear model will instead look like a plane in three dimensions.

Partial Regression Coefficients

Even though our model is no longer just a straight line, the familiar slope interpretation from simple linear regression still applies to each individual parameter: \(\beta_i\) now describes the expected change in the mean response for a one-unit increase in \(x_i\), holding all other predictors constant.

Example 1

Interpreting a partial coefficient. Suppose the fitted model for life expectancy includes Internet usage and BirthRate, with estimated coefficients:

Internet: 0.112
BirthRate: -0.594

Then:

  • A one-unit (1 percentage point) increase in internet usage is associated with an expected increase of about 0.11 years in life expectancy, holding all other predictors fixed.
  • A one‑unit increase in birth rate corresponds to an expected decrease of about 0.59 years in life expectancy, controlling for all other predictors.
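This “holding all other predictors constant” interpretation can be verified numerically: if we compare predictions for two observations that differ by exactly one unit in a single predictor, the difference in predicted response equals that predictor's coefficient. The sketch below uses simulated data with illustrative names and values, not the real life-expectancy dataset:

```r
# Simulated data loosely mimicking the life expectancy example
set.seed(42)
n         <- 100
internet  <- runif(n, 0, 100)
birthrate <- runif(n, 10, 40)
lifeexp   <- 60 + 0.1 * internet - 0.6 * birthrate + rnorm(n)

fit <- lm(lifeexp ~ internet + birthrate)

# Two hypothetical observations differing only by 1 in internet usage
newdata <- data.frame(internet = c(50, 51), birthrate = c(20, 20))
diff(predict(fit, newdata))  # equals the fitted internet coefficient exactly
```

Because the model is linear and birthrate is held fixed, this difference in predictions is exactly the fitted coefficient on internet.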

4.3 Fitting MLR Models in R

MLR uses the same lm() function as SLR.

mod <- lm(LifeExp ~ Population + Health + Internet + BirthRate,
          data = countries.df)
summary(mod)

The coefficient table includes:

  • Estimate (\(\hat{\beta}_i\))
  • Standard error
  • t‑statistic and p‑value for testing \(H_0 : \beta_i = 0\)

Individual Coefficient Inference

As in SLR, individual t‑tests assess whether predictors are useful, but now these tests reflect conditional usefulness.

Confidence intervals for the individual coefficients are obtained with:

confint(mod)

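The entries of the coefficient table can also be extracted programmatically from the summary object. A minimal sketch on simulated data (the variable names here are illustrative):

```r
# Simulate data where x1 truly affects y and x2 does not
set.seed(7)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients
coef_table[, "Pr(>|t|)"]      # p-values for H0: beta_i = 0
confint(fit, level = 0.95)    # 95% confidence intervals for each beta_i
```

Here the p-value for x1 will be very small, while the p-value for x2 (which has no real effect) will typically be large, illustrating how the t-tests assess each predictor's conditional usefulness.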
Exercise 1

Using a provided regression summary:

  1. Identify which predictors are useful at the 5% level.

  2. Interpret the adjusted \(R^2\).

  3. State the conclusion of the global F‑test.

  4. Interpret one coefficient in context.



4.4 Global Model Usefulness: The F-Test

The global F-test evaluates whether the model as a whole is useful. Rather than looking at each coefficient separately, it asks whether any of the predictors contribute to explaining variation in the response.

We test:

  • \(H_0\): all slope coefficients are zero (no linear relationship between \(Y\) and the predictors),
  • \(H_a\): at least one slope coefficient is non-zero (at least one predictor is useful).

The F-statistic and its p-value are reported in the summary(lm) output. A small p-value (for example, less than 0.05) provides evidence that the model is useful overall.

Example
In the life expectancy example, the output might report

F-statistic: 28.2 on 4 and 44 DF,  p-value: 1.19e-11

The very small p-value indicates strong evidence against \(H_0\). We conclude that at least one of the predictors (Population, Health, Internet, BirthRate) is useful for predicting life expectancy, and that the model is useful overall.
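The reported p-value can be reproduced from the F-statistic and its two degrees of freedom using the upper tail of the F distribution:

```r
# Upper-tail probability of an F(4, 44) distribution beyond 28.2;
# should match the reported p-value (1.19e-11) up to rounding
pf(28.2, df1 = 4, df2 = 44, lower.tail = FALSE)
```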

Exercise
Given an F-statistic of 12.3 with p = 0.0004, state the null and alternative hypotheses and conclude whether the model is useful at the 5% level.


4.5 Multicollinearity (Brief Overview)

In multiple regression, some predictors are often correlated with each other. This is known as multicollinearity.

When multicollinearity is present:

  • Predictors share overlapping information about the response.
  • Standard errors of the affected coefficients can become large.
  • Coefficients can appear non-significant even when the variables are important.
  • Coefficient signs may be unstable and difficult to interpret.

A simple descriptive check is to look at correlations between predictors, or a pairs plot. More formal diagnostics (such as variance inflation factors) and strategies for dealing with multicollinearity are covered in Module 6.
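A small simulation illustrates the effect. Here the data are artificial: x2 is constructed to be nearly a copy of x1, and we compare the standard error of x1's coefficient with and without its near-duplicate in the model:

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)  # x2 is almost identical to x1
x3 <- rnorm(n)                  # an unrelated predictor
y  <- 1 + x1 + x3 + rnorm(n)

cor(x1, x2)  # descriptive check: very close to 1

se_collinear <- summary(lm(y ~ x1 + x2 + x3))$coefficients["x1", "Std. Error"]
se_alone     <- summary(lm(y ~ x1 + x3))$coefficients["x1", "Std. Error"]
c(se_alone, se_collinear)  # the collinear fit's standard error is far larger
```

The inflated standard error means the t-test for x1 can fail to reject \(H_0\) even though x1 genuinely drives the response.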

Exercise
Suppose two predictors in a regression model are very strongly positively correlated. Briefly explain how this might affect (a) the standard errors of their coefficients and (b) your ability to interpret their partial effects.


4.6 Measures of Model Fit: \(R^2\) and Adjusted \(R^2\)

As in simple linear regression, \(R^2\) measures the proportion of variation in \(Y\) explained by the model.

However, in MLR \(R^2\) always increases (or at least does not decrease) when we add more predictors, even if the new predictors are not truly useful.

Adjusted \(R^2\) is a modified version of \(R^2\) that includes a penalty for the number of predictors. It increases only when a new predictor improves the model more than would be expected by chance.

Use adjusted \(R^2\) when comparing models with different numbers of predictors.
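A sketch on simulated data illustrates the difference: adding a pure-noise predictor can never decrease \(R^2\), while adjusted \(R^2\) penalises the extra term:

```r
set.seed(9)
n     <- 60
x1    <- rnorm(n)
noise <- rnorm(n)            # pure noise, unrelated to y
y     <- 1 + x1 + rnorm(n)

fit_small <- lm(y ~ x1)
fit_big   <- lm(y ~ x1 + noise)

summary(fit_small)$r.squared     # R^2 of the smaller model...
summary(fit_big)$r.squared       # ...is never larger than the bigger model's

summary(fit_small)$adj.r.squared # adjusted R^2 penalises the useless predictor,
summary(fit_big)$adj.r.squared   # so it will typically drop for the bigger model
```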


4.7 Summary

This chapter introduced the multiple regression model, the interpretation of partial coefficients, and the core inferential tools used to assess model usefulness. Later modules will extend these ideas to interactions, categorical predictors, model selection, and diagnostic analysis.